ABSTRACT

This paper offers a local distributed algorithm for expecta- 
tion maximization in large peer-to-peer environments. The 
algorithm can be used for a variety of well-known data mining
tasks in a distributed environment, such as clustering,
anomaly detection, and target tracking, to name a few. This
technology is crucial for many emerging peer-to-peer ap- 
plications for bioinformatics, astronomy, social networking, 
sensor networks and web mining. Centralizing all or some 
of the data for building global models is impractical in such 
peer-to-peer environments because of the large number of 
data sources, the asynchronous nature of the peer-to-peer 
networks, and the dynamic nature of the data and the network. The
distributed algorithm we have developed in this paper is
provably correct, i.e., it converges to the same result as a
comparable centralized algorithm, and can automatically
adapt to changes in the data and the network. We
show that the communication overhead of the algorithm is 
very low due to its local nature. This monitoring algorithm 
is then used as a feedback loop to sample data from the 
network and rebuild the model when it is outdated. We 
present thorough experimental results to verify our theoret- 
ical claims. 
H.2.4 [Database Management Systems]: Distributed
databases; H.2.8 [Database Applications]: Data mining; 
1. INTRODUCTION

Expectation Maximization (EM) is a powerful statistical 
and data mining tool which can be widely used for a vari- 
ety of tasks such as clustering, estimating parameters from 
the data even in the presence of hidden variables, anomaly 
detection, target tracking and more. In 1977, Dempster et 
al. [10] presented the seminal work on EM and its appli- 
cation for estimating the parameters of a Gaussian Mixture 
Model (GMM). The authors showed that given a sample of 
data, there is a two step process which can estimate certain 
unknown parameters of the data in the presence of hidden 
variables. This is done by maximizing the log likelihood 
score of the data and assuming a generative model. Thus 
the classical EM algorithm is well-understood and produces 
satisfactory estimates of the parameters when the data is 
centralized. 

However, there exist emerging technologies in which the 
data is not located at a central location but rather dis- 
tributed across a large network of nodes or machines con- 
nected by an underlying communication infrastructure. The 
next generation Peer-to-Peer (P2P) networks such as Gnutella,
BitTorrent, e-Mule, Kazaa, and Freenet offer some examples.
As argued on several occasions, P2P networks can no
longer be viewed as an isolated medium of data storage and 
dissemination; recent research on P2P web community for- 
mation [24] [9], bioinformatics 1 and diagnostics [13] [32] has 
shown that interesting information can be extracted from 
the data in such networks. However data analysis in such 
environments calls for a new breed of algorithms which are 
asynchronous, highly communication efficient and scalable. 

1 http://smweb.bcgsc.be.ca/Chinook/index.html

To solve this problem, in this paper we develop a local,
asynchronous, P2P distributed algorithm (PeDEM) for
monitoring and subsequent reactive updating of a GMM
using an EM-style update technique. Our algorithm
is provably correct, i.e., given all the data, our algorithm will
converge to the same result produced by a similar centralized
algorithm. The algorithmic framework is local, in the sense
that the computation and communication load at each node
is independent of the size or the number of nodes of the
network. This guarantees high scalability of the algorithm,
possibly to millions of nodes. The proposed methodology takes
a two-step approach for building and maintaining GMM
parameters in P2P networks. The first step is the monitoring
phase in which, given an arbitrary estimate of the GMM
parameters, our local asynchronous algorithm checks if they
are valid with respect to the current data to within
user-specified thresholds using different metrics, as we explain
later. If not, this algorithm raises a flag, thereby indicating
that the parameters are out of date. At this point, we
employ a convergecast-broadcast technique to rebuild the
model parameters. This step is known as the computation
phase. The correctness of the monitoring algorithm guarantees
that a peer need not do anything if the flag is not
raised, thus reducing communication and computation
costs. When the data undergoes a change in the underlying
distribution and the GMM parameters no longer represent
it, the feedback loop indicates this and the parameters are
recomputed. The specific contributions of this paper are as
follows:

• To the best of the authors’ knowledge, this is one of
the first attempts at developing a completely asynchronous
and local algorithm for monitoring the GMM
parameters in P2P networks using an EM style of update
rule. Previous research has only performed computation
of the parameters in a distributed setup.

• Besides this direct contribution, this paper shows how
second order statistics can be directly monitored in a
P2P network. Previous work [36] only showed how
first order statistics, such as the mean of the data, can
be monitored.

The rest of the paper is organized as follows. Section 2 
presents the motivation of this work. Related background 
material is presented in Section 3. Section 4 provides some 
necessary background material on the EM algorithm and 
then introduces the notations and problem definition. Sec- 
tion 5 discusses the main theorem and its application for 
developing the algorithm. The monitoring of the parameters
is presented in Section 6. Section 7 discusses the
computation problem. Theoretical analysis of the algorithm 
is presented in Section 8 followed by experimental results in 
Section 9. Finally, Section 10 concludes this paper. 

2. MOTIVATION 

Monitoring models in large distributed environments can
be done in three different ways: (1) periodic, (2) incremental,
or (3) reactive. In the periodic mode, a model is built
at fixed intervals of time. While this approach is simple,
one needs to come up with an optimal value of the interval.
Too small an interval may rebuild models when not needed,
thereby wasting resources (e.g., when the data is stationary),
while too long an interval may not update the model often
enough when the data is dynamic. The incremental
approach adjusts the model whenever the data
changes, thereby keeping the model up-to-date. However,
developing incremental algorithms may be difficult. The
third approach, which we take in this paper and which is also
used in [4] [36] [8], is to update a model only when the data no
longer fits it. If the data is piecewise stationary, it has been
shown that this approach may be both simple and efficient.

The fit between the model and the data can be checked
using several metrics: L2-norm, $\chi^2$, log-likelihood, etc. In
this paper we have developed a local distributed algorithm
for monitoring the GMM parameters using (1) the log-likelihood
of the data, and (2) the norm difference between the parameters.
Local algorithms rely on a set of data dependent rules
that decide when a peer can stop sending messages
and output the result even if it has communicated with only
a handful of immediate neighbors. A peer can do nothing
even if its data has changed as long as its local rules are
satisfied.

Practical scenarios in which distributed EM can 
be used 

3. RELATED WORK 

Work related to this research can be subdivided into two 
major areas — distributed EM algorithms and computation 
in large distributed systems a.k.a P2P systems. We pro- 
vide a brief exposure to each of the topics in the next two 
subsections. 

3.1 Distributed EM Algorithms 

In the standard EM algorithm, the task is to estimate some
unknown parameters from the given data in the presence
of some unobserved or hidden variables. The seminal work
by Dempster et al. [10] proposed an iterative technique
alternating between the E-step and M-step that solves this
estimation problem. The paper also proved the convergence
of the EM algorithm. In the E-step, the expected value of the
assumed hidden variables is computed using an estimate of
the unknown parameters and the data. In the M-step, the
log-likelihood of the unknown parameters given the data and
hidden variables (as found in the previous step) is maximized.
This two-step process is repeated until the parameter
estimates converge. Furthermore, for Gaussian mixture models
(GMM), the updates for the M-step can be written in a
closed form as a computation of the weighted combination
of all data points. This decomposable nature of the problem
makes it tractable in a distributed setup.
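
To make the two steps concrete, the following minimal sketch performs one EM iteration for a one-dimensional Gaussian mixture. It is an illustration only and is not taken from any of the cited works: the function names `normal_pdf`, `em_step` and `log_likelihood` are hypothetical, and the one-dimensional setting (scalar variances in place of covariance matrices) is an assumption made for brevity.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a 1-D normal distribution with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def em_step(data, weights, means, variances):
    """One EM iteration for a 1-D Gaussian mixture; returns the updated parameters."""
    k, n = len(weights), len(data)
    # E-step: q[s][a] is the contribution of component s to point a.
    q = [[0.0] * n for _ in range(k)]
    for a, x in enumerate(data):
        joint = [weights[s] * normal_pdf(x, means[s], variances[s]) for s in range(k)]
        total = sum(joint)  # assumed non-zero (no numerical underflow)
        for s in range(k):
            q[s][a] = joint[s] / total
    # M-step: every update is a weighted combination of the data points.
    new_w, new_m, new_v = [], [], []
    for s in range(k):
        mass = sum(q[s])
        mean = sum(q[s][a] * data[a] for a in range(n)) / mass
        var = sum(q[s][a] * (data[a] - mean) ** 2 for a in range(n)) / mass
        new_w.append(mass / n)
        new_m.append(mean)
        new_v.append(var)
    return new_w, new_m, new_v


def log_likelihood(data, weights, means, variances):
    """Log-likelihood of the data under the mixture."""
    return sum(
        math.log(sum(w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)))
        for x in data
    )
```

Calling `em_step` repeatedly until `log_likelihood` stops improving gives the usual centralized procedure; the sums inside the M-step are the quantities that a distributed variant has to aggregate across nodes.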

In the distributed setup, we need to find decentralized
implementations of the E-step and M-step. A distributed
implementation of the E-step is straightforward and
communication-free: given the current estimates of the parameters,
each node simply evaluates the estimates of the hidden variables
based only on its local data. So the focus of most distributed EM
research is to efficiently compute the parameters in the M-step
in a distributed fashion. The naive approach of simply
aggregating all the data at a central location does not scale
well for large asynchronous networks. Next, we discuss several
implementations of the distributed M-step.

In 2003, Nowak [31] proposed a distributed EM (DEM) 
algorithm with the execution of the M-step as follows. A 
ring topology is overlaid over the original network encom- 
passing all the nodes in the network. Due to this ring, the 
updates in the M-step proceed in a cyclical fashion whereby 
at iteration t, node m gets the estimates of the parameters 
from its neighbor in the ring, updates those estimates with 
its local data, and passes them to the next neighbor at clock 
tick t + 1. The paper proves that DEM converges monotonically
to a local maximum and, because of the incremental
update, converges more rapidly than the standard EM
algorithm. However, this technique is not likely to scale to
large asynchronous networks due to the strict requirement of
an overlay ring topology covering all the nodes in the
network. Moreover, the algorithm is highly synchronized, with
each round of computation taking time proportional
to the size of the network. This becomes problematic
especially if the data changes or a node fails or joins, in which
case the entire computation needs to be restarted from scratch.

To overcome this problem, several techniques have been
developed. Recalling that for GMM the updates for the M-step
can be written as weighted averages over all the nodes’
data, any distributed averaging technique can be used for
performing this computation over a large number of nodes. In
the literature there are two basic types of distributed averaging
techniques: (1) probabilistic, i.e., gossip style [17] [6] [20],
in which a node repeatedly selects another node in the network
and averages its data with the selected node, and (2)
deterministic, i.e., graph-Laplacian [26] or linear dynamical
systems based [34], in which a node repeatedly communicates
with its immediate neighbors only and updates its state
with the information it gets from all its neighbors. While
the first class of algorithms guarantees the correct result
probabilistically, the deterministic algorithms converge to the
correct result asymptotically. Newscast EM [21], proposed by
Kowalczyk and Vlassis, uses gossip-style distributed computation
to compute the parameters of the M-step. At each iteration, any
peer $P_i$ selects another peer $P_j$ at random and both compute
the average of their data. It can be shown that, if the peer
selection is done uniformly at random, any such gossip-based
algorithm converges to the correct result exponentially fast. For
such a technique to work in practice, the network must be fully
connected, i.e., any node must be able to select any other node
in the network. Using a deterministic averaging technique, Gu
[12] proposed an EM algorithm for GMM. In this algorithm
peers communicate with immediate neighbors only. At any
timestep t, whenever a peer $P_i$ gets the current estimates
from all of its immediate neighbors, it updates its own estimate
based on its own data and all that it has received.
Then it moves to the next timestep t + 1 and broadcasts
its own estimate to its immediate neighbors. This process
continues forever. However, the major criticism of both of
these techniques is that they are highly synchronous and
hence not scalable to large asynchronous P2P networks.
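
The gossip-style averaging described above can be illustrated with a short simulation. This is a sketch of generic pairwise gossip under simplifying assumptions (scalar values, a fully connected network, one pair active per round, a fixed seed for reproducibility); it is not the Newscast EM protocol itself, and the name `gossip_average` is hypothetical.

```python
import random


def gossip_average(values, rounds, seed=0):
    """Simulate pairwise gossip averaging over a fully connected network.

    In each round two distinct peers are chosen uniformly at random and both
    replace their value with the mean of the pair. The sum of all values is
    preserved, so every peer drifts toward the global average.
    """
    rng = random.Random(seed)
    state = list(values)
    for _ in range(rounds):
        i, j = rng.sample(range(len(state)), 2)
        mean = (state[i] + state[j]) / 2.0
        state[i] = state[j] = mean
    return state
```

Each peer ends up holding an approximation of the global mean without any node ever seeing all the data, which is the property the gossip-based M-step relies on.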

In this paper we take a different approach. Assuming a 
previous estimation of the parameters, we monitor if the 
parameters are still valid with respect to the current global 
data. Our algorithmic framework guarantees correct results 
(with respect to centralization) at a low communication cost. 

Several applications of distributed EM algorithms have
also been proposed in the literature. Multi-camera tracking
[27], acoustic source localization in sensor networks [19], and
distributed multimedia indexing [30] are some examples.

3.2 Data Mining in Large Distributed (P2P) 
Systems 

Based on the type of computation performed in P2P sys- 
tems, this section can be subdivided into approximate algo- 
rithms and exact algorithms. 

3.2.1 Approximate Algorithms

Approximate algorithms, as the name suggests, compute
approximate data mining results. The approximation
can be probabilistic or deterministic.

Probabilistic algorithms use some variation of a graph random
walk to sample data from their own partition and that
of several neighbors, and then build a model assuming that
this data is representative of that of the entire set of peers.
Examples of these algorithms include the P2P k-Means
algorithm by Bandyopadhyay et al. [2], the newscast model
by Kowalczyk et al. [20], the ordinal statistics based
distributed inner product identification for P2P networks by
Das et al. [9], the gossip-based protocols by Kempe et al.
[17] and Boyd et al. [6], and more.

Researchers have proposed deterministic approximation
techniques using the method of variational approximation
[16] [14]. Mukherjee and Kargupta [28] have proposed
distributed algorithms for inference in wireless sensor
networks. Asymptotically converging algorithms for computing
simple primitives such as mean, sum, etc. have also been
proposed by Mehyar et al. [26], and by Jelasity et al. [15].

3.2.2 Exact Algorithms 

In exact distributed algorithms, the results produced are
exactly the same as if every peer were given all the data.
They can further be subdivided into convergecast algorithms, 
flooding algorithms and local algorithms. 

Flooding algorithms, as the name suggests, flood whatever 
data is available at each node. This is very expensive espe- 
cially for large systems and more so when the data changes. 

In convergecast algorithms, the computation takes place 
over a spanning tree and the data is sent from the leaves up
to the root. Algorithms such as [33] provide generic solutions
— suitable for the computation of multiple functions. These 
algorithms are, however, extremely synchronized. 
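
As a rough illustration of the convergecast pattern, the sketch below aggregates a sum from the leaves of a spanning tree up to the root. It is a sequential simulation, not a networked implementation and not the algorithm of [33]; the adjacency-dictionary representation and the name `convergecast_sum` are assumptions made for this example.

```python
def convergecast_sum(tree, node, local_value, parent=None):
    """Return the sum of the local values in the subtree rooted at `node`.

    `tree` maps each node to the list of its neighbors in the spanning tree.
    Each recursive call stands for a child sending its partial sum to its parent.
    """
    total = local_value[node]
    for neighbor in tree[node]:
        if neighbor != parent:
            total += convergecast_sum(tree, neighbor, local_value, node)
    return total
```

The root can only finish after every subtree has reported, which is why such schemes require tight synchronization across the whole tree.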

Local algorithms are a class of highly efficient algorithms
developed for P2P networks. They are data dependent
distributed algorithms. In a distributed setup, data
dependency means that under certain conditions peers can cease
to communicate with one another and the result is still exact.
These conditions can occur after a peer has collected the
statistics of just a few other peers. In such cases, the overhead
of every peer becomes independent of the size of the
network and hence, local algorithms are exceptionally suitable
for P2P networks as well as for wireless sensor networks.

In the context of graph theory, local algorithms were used 
in the early nineties by Linial [23] and later Afek et al. [1]. 
Naor and Stockmeyer [29] asked what properties of a graph 
can be computed in constant time independent of the graph 
size. Local algorithms for P2P data mining include the
majority voting protocol developed by Wolff and Schuster
[37]. Based on its variants, researchers have further proposed
more complicated algorithms: facility location [22], outlier
detection [7], meta-classification [25], eigenvector monitoring
[8], multivariate regression [4], decision trees [5] and the
generic local algorithms [36].

Communication-efficient broadcast-based algorithms have
also been developed for large clusters, such as the one
developed by Sharfman et al. [35]. Since these algorithms rely on
broadcasts as their mode of communication, the cost quickly 
increases with increasing system size. 

4. PRELIMINARIES 

In this section we present some background material nec- 
essary to understand the PeDEM algorithm that we have 
developed. 

4.1 Expectation Maximization 

EM [10] is an iterative optimization technique to estimate
some unknown parameters $\Theta$ given some data $U$. It is also
assumed that there are some hidden variables $J$. The EM
algorithm iteratively alternates between two steps to maximize
the posterior probability distribution of the parameters $\Theta$
given $U$:

• E-Step: estimate the Expected value of $J$ given $\Theta$
and $U$.

• M-Step: re-estimate $\Theta$ to Maximize the likelihood
of $U$, given the estimates of $J$ found in the previous
E-step.

In order to apply the above two rules for estimation, we need 
closed form expressions for the E- and M-steps. Fortunately, 
closed form expressions exist for a widely popular estimation 
problem, viz. Gaussian mixture modeling (GMM). We discuss
this in detail next, as we will use it in the rest of the
paper for developing our distributed algorithm.

A multidimensional Gaussian mixture for a random vector
$\vec{x} \in \mathbb{R}^d$ is defined as the weighted combination:

$$p(\vec{x}) = \sum_{s=1}^{k} \pi_s \, p(\vec{x} \mid s)$$

of $k$ Gaussian densities where the $s$-th density is given by

$$p(\vec{x} \mid s) = \frac{1}{(2\pi)^{d/2} |C_s|^{1/2}} \exp\left[ -(\vec{x} - \vec{\mu}_s)^T C_s^{-1} (\vec{x} - \vec{\mu}_s)/2 \right]$$

each parameterized by its mean vector $\vec{\mu}_s = [\mu_{s.1}\, \mu_{s.2} \ldots \mu_{s.d}]^T$
and covariance matrix $C_s = (\vec{x} - \vec{\mu}_s)(\vec{x} - \vec{\mu}_s)^T$. $\pi_s = p(s)$
defines a discrete probability distribution over the $k$ components.
Given $n$ multi-dimensional samples $X = \{\vec{x}_1, \ldots, \vec{x}_n\}$,
the task is to estimate the set of parameters

$$\Theta = \{\vec{\mu}_1, \ldots, \vec{\mu}_k, C_1, \ldots, C_k, \pi_1, \ldots, \pi_k\}$$

by maximizing the log likelihood of the parameters given the
data:

$$\mathcal{L}(\Theta \mid X) = \log \prod_{a=1}^{n} p(\vec{x}_a \mid \Theta) = \sum_{a=1}^{n} \log \left( \sum_{s=1}^{k} \pi_s \, p(\vec{x}_a \mid \vec{\mu}_s, C_s) \right)$$

Using EM for GMM, the E-step and the M-step can be written as:

E-step (estimate the contribution of each point):

$$q_{s,a} = \frac{\pi_s \, \mathcal{N}(\vec{x}_a; \vec{\mu}_s, C_s)}{\sum_{r=1}^{k} \pi_r \, \mathcal{N}(\vec{x}_a; \vec{\mu}_r, C_r)} \qquad (1)$$

M-step (recompute the parameters):

$$\pi_s = \frac{\sum_{a=1}^{n} q_{s,a}}{n} \qquad (2)$$

$$\vec{\mu}_s = \frac{\sum_{a=1}^{n} q_{s,a} \, \vec{x}_a}{\sum_{a=1}^{n} q_{s,a}} \qquad (3)$$

$$C_s = \frac{\sum_{a=1}^{n} q_{s,a} (\vec{x}_a - \vec{\mu}_s)(\vec{x}_a - \vec{\mu}_s)^T}{\sum_{a=1}^{n} q_{s,a}} \qquad (4)$$

where $\mathcal{N}(\vec{x}_a; \vec{\mu}_s, C_s)$ denotes the pdf of a normal distribution
with input $\vec{x}_a$, mean $\vec{\mu}_s$ and covariance $C_s$. Note that
the above computation needs to be carried out for all the $k$
Gaussian components.

In the next few sections we shift our focus to distributed
computation of these parameters and discuss some assumptions
and necessary background material.

4.2 Notations and Assumptions

Let $V = \{P_1, \ldots, P_p\}$ be a set of peers connected to
one another via an underlying communication infrastructure
such that the set of $P_i$’s neighbors, $\Gamma_i$, is known to $P_i$.
Each peer communicates with its immediate neighbors (one
hop neighbors) only. At time $t$, let $\mathcal{G}$ denote a collection of
data tuples which have been generated from the $k$ Gaussian
densities having unknown parameters and unknown mixing
probabilities. The tuples are horizontally distributed over a
large (undirected) network of machines (peers). The local
data of peer $P_i$ at time $t$ is $S_i = [\vec{x}_{i,1}, \vec{x}_{i,2}, \ldots, \vec{x}_{i,m_i}]$, where
$\vec{x}_{i,j} = [x_{i,j.1}\, x_{i,j.2} \ldots x_{i,j.d}]^T \in \mathbb{R}^d$. Here $m_i$ denotes the
number of data tuples at $P_i$ and $d$ denotes the dimensionality
of the data. The global input at any time is the set of
all inputs of all the peers and is denoted by $\mathcal{G} = \bigcup_{i=1}^{p} S_i$.

Our goal is to develop a framework under which each peer
(1) checks if the current parameters of the GMM are up-to-date
with respect to the global (all peers’) data, and (2)
recomputes the models whenever deemed unfit. The network
that we are dealing with can change anytime, i.e., peers can
join or leave. Moreover, the data is dynamic and is only
assumed to be piecewise stationary. The proposed algorithm
is designed to seamlessly adapt to network and data changes
in a communication-efficient manner.

We assume that communication among neighboring peers
is reliable and ordered. These assumptions can be imposed
using heartbeat mechanisms or retransmissions proposed
elsewhere [11] [18] [36] [5]. Furthermore, it is assumed that data
sent from $P_i$ to $P_j$ is never sent back to $P_i$. One way of
ensuring this is to assume that communication takes place
over a communication tree, an assumption we make here
(see [36] and [5] for a discussion of how this assumption can
be accommodated or, if desired, removed).

4.3 Problem Formulation in P2P Scenario

When all the data is available at a central location, the
update equations for the iterative EM algorithm are given
by Equations 1-4. However, in the distributed setup, all the
data is not available at a central location. Therefore, for any
peer $P_i$, the log likelihood and the update equations can be
written as:

$$\mathcal{L}(\Theta \mid \mathcal{G}) = \sum_{i=1}^{p} \sum_{a=1}^{m_i} \log \sum_{s=1}^{k} \pi_s \, \mathcal{N}(\vec{x}_{i,a}; \vec{\mu}_s, C_s) \qquad (5)$$

E-step:

$$q_{i,s,a} = \frac{\pi_s \, \mathcal{N}(\vec{x}_{i,a}; \vec{\mu}_s, C_s)}{\sum_{r=1}^{k} \pi_r \, \mathcal{N}(\vec{x}_{i,a}; \vec{\mu}_r, C_r)} \qquad (6)$$

M-step:

$$\pi_s = \frac{\sum_{i=1}^{p} \sum_{a=1}^{m_i} q_{i,s,a}}{\sum_{i=1}^{p} m_i} \qquad (7)$$

$$\vec{\mu}_s = \frac{\sum_{i=1}^{p} \sum_{a=1}^{m_i} q_{i,s,a} \, \vec{x}_{i,a}}{\sum_{i=1}^{p} \sum_{a=1}^{m_i} q_{i,s,a}} \qquad (8)$$

$$C_s = \frac{\sum_{i=1}^{p} \sum_{a=1}^{m_i} q_{i,s,a} (\vec{x}_{i,a} - \vec{\mu}_s)(\vec{x}_{i,a} - \vec{\mu}_s)^T}{\sum_{i=1}^{p} \sum_{a=1}^{m_i} q_{i,s,a}} \qquad (9)$$


Note that computation in the E-step is entirely local to a 
peer. However, for the log-likelihood and the M-step, a peer 
needs information from all the nodes in the network in order 
to recompute the parameters. In this paper, we consider a 
monitoring version of this problem: given a time-varying
data set and pre-computed initial values of these parameters
(built from centralized or sampled data) disseminated to all
peers, do these parameters describe the union of all the data
held by all the peers?
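
The split between a local E-step and a global M-step can be sketched as follows. Each peer reduces its data to a few partial sums, and the M-step is obtained by adding those sums across peers. This is an illustration under simplifying assumptions rather than the PeDEM algorithm: it is one-dimensional, the merge is performed in one place instead of over a network, the variance is computed around the newly merged mean via second moments, and the names `local_stats` and `merge_and_update` are hypothetical.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a 1-D normal distribution."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def local_stats(local_data, weights, means, variances):
    """Local E-step of one peer plus the partial sums needed by the M-step.

    Returns (stats, m_i) where stats[s] = [sum q, sum q*x, sum q*x^2]
    for component s and m_i is the number of local tuples.
    """
    k = len(weights)
    stats = [[0.0, 0.0, 0.0] for _ in range(k)]
    for x in local_data:
        joint = [weights[s] * normal_pdf(x, means[s], variances[s]) for s in range(k)]
        total = sum(joint)
        for s in range(k):
            q = joint[s] / total
            stats[s][0] += q
            stats[s][1] += q * x
            stats[s][2] += q * x * x
    return stats, len(local_data)


def merge_and_update(all_stats):
    """Add the peers' partial sums and apply the closed-form M-step.

    Returns one (mixing weight, mean, variance) triple per component.
    """
    total_points = sum(m for _, m in all_stats)
    k = len(all_stats[0][0])
    params = []
    for s in range(k):
        q_sum = sum(stats[s][0] for stats, _ in all_stats)
        qx_sum = sum(stats[s][1] for stats, _ in all_stats)
        qxx_sum = sum(stats[s][2] for stats, _ in all_stats)
        mean = qx_sum / q_sum
        var = qxx_sum / q_sum - mean ** 2  # second moment minus squared mean
        params.append((q_sum / total_points, mean, var))
    return params
```

Because the partial sums are additive, the merged update does not depend on how the tuples are partitioned among peers; only the aggregation itself requires communication.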



In other words, we focus on monitoring and subsequent
reactive updating of the GMM parameters. Given all the
data at a central location, an admissible solution to the
GMM problem occurs when the estimated parameters (given
by Equations 2-4) become equal to the true parameters.
However, for the distributed setup, since we are considering
a dynamic scenario, we relax this criterion and consider
the solution to be admissible when it is within a user-defined
threshold $\epsilon$ of its true value. For the monitoring problem, let
$\hat{\pi}_s$, $\hat{\vec{\mu}}_s$ and $\hat{C}_s$ denote the parameters that were calculated
offline based on some past data, and disseminated to all the
peers. The monitoring problem is to check if these parameters
$\hat{\pi}_s$ ($1 \times 1$), $\hat{\vec{\mu}}_s$ ($d \times 1$) and $\hat{C}_s$ ($d \times d$) are valid with
respect to the current data of all the peers. We use two
different metrics to perform this: (1) monitor the log-likelihood
of the data, and (2) monitor the parameters themselves.
Below is a formal problem definition.

Problem Definition: Given a time varying dataset $S_i$,
user defined thresholds $\epsilon_1$, $\epsilon_2$, and $\epsilon_3$, and pre-computed
estimates $\hat{\Theta} = \{\hat{\pi}_s, \hat{\vec{\mu}}_s, \hat{C}_s, \ldots\}$ for each Gaussian, the
monitoring problem is to check if:

• $\mathcal{L}(\hat{\Theta} \mid \mathcal{G}) < \epsilon$

• $|\pi_s - \hat{\pi}_s| < \epsilon_1$

• $\|\vec{\mu}_s - \hat{\vec{\mu}}_s\| < \epsilon_2$

• $\|C_s - \hat{C}_s\|_F < \epsilon_3$

for every Gaussian $s \in \{1, \ldots, k\}$, where $\|\cdot\|_F$ denotes the
Frobenius norm of a matrix. In many cases, thresholding
the log-likelihood of the data may be enough. However,
there are situations where monitoring the parameters may
prove beneficial, as we discuss later.

4.4 Monitoring Functions in P2P Environment 

As a building block of PeDEM, we use an efficient, provably
correct, and local algorithm for monitoring functions of
average vectors in $\mathbb{R}^d$, where the vectors are distributed in a
P2P network. Here we present a brief summary; interested
readers are referred to [3] [36] for details.

Peers communicate with one another by sending sets of
points in $\mathbb{R}^d$ or statistics as defined later in this section. Let
$X_{i,j}$ denote the last set of points sent by peer $P_i$ to $P_j$.
Assuming reliable messaging, once a message is delivered both
$P_i$ and $P_j$ know $X_{i,j}$ and $X_{j,i}$. Below we present definitions
of several sets which are crucial to the monitoring algorithm.

Definition 4.1. The knowledge of $P_i$ is the union of
$S_i$ with $X_{j,i}$ for all $P_j \in \Gamma_i$ and is denoted by
$\mathcal{K}_i = S_i \cup \bigcup_{P_j \in \Gamma_i} X_{j,i}$.

$\mathcal{K}_i$ can also be initialized using combinations of vectors
defined on $S_i$ (instead of only $S_i$) as we will present in the next
section.

Definition 4.2. The agreement of $P_i$ and any of its
neighbors $P_j$ is $\mathcal{A}_{i,j} = X_{i,j} \cup X_{j,i}$.

Definition 4.3. The subtraction of the agreement from
the knowledge is the withheld knowledge of $P_i$ with respect
to a neighbor $P_j$, i.e., $\mathcal{W}_{i,j} = \mathcal{K}_i \setminus \mathcal{A}_{i,j}$.


The next section presents a theorem which shows how 
we can convert this monitoring problem into a geometric 
problem for an efficient solution. For this we need to split 
the domain into convex regions since the stopping condition 
we describe later (Theorem 5.1) relies on this. The following 
definition states the properties of these convex regions. 

Definition 4.4. A collection of non-overlapping convex
regions $\mathcal{R}_{\mathcal{F}} = \{R_1, R_2, \ldots, R_\ell, T\}$ is a cover of region $\mathbb{R}^d$,
invariant with respect to a function $\mathcal{F} : \mathbb{R}^d \to O$ (where $O$ is
an arbitrary range), if (1) every $R_i \in \mathcal{R}_{\mathcal{F}}$ (except $T$) is convex,
(2) $\mathcal{F}$ is invariant in $R_i$, i.e., $\forall \vec{x}, \vec{y} \in R_i$, $\mathcal{F}(\vec{x}) = \mathcal{F}(\vec{y})$,
and (3) $T$ denotes the area of the domain not encompassed
by $\bigcup_{i=1}^{\ell} R_i$, known as the tie region.

Finally, for any $\vec{x} \in \mathbb{R}^d$ we denote by $R_{\mathcal{F}}(\vec{x})$ the first region
of $\mathcal{R}_{\mathcal{F}}$ which includes $\vec{x}$. The precise specification of the
convex regions will depend on the definition of $\mathcal{F}$. Monitoring
of the GMM parameters will require us to invoke three
separate monitoring problems with three separate convex
regions as we show in Section 6.

The goal is to monitor and compute mixture models defined
on $\mathcal{G}$. Since $\mathcal{G}$ is a hypothetical quantity, not available
to any peer, each peer will estimate $\mathcal{G}$ based on only the sets
of vectors defined above. However, these sets can be large,
thereby making communication expensive. Fortunately, under
the assumption that communication takes place over a
tree topology imposed on the network, it can be shown that
the same sets can be represented uniquely by only two
sufficient statistics which we define next.

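Set Statistics: For each set, define two statistics: (1) the
average, which is the average of all the points in the respective
set (e.g. $\overline{S_i}$, $\overline{\mathcal{K}_i}$, $\overline{\mathcal{A}_{i,j}}$, $\overline{\mathcal{W}_{i,j}}$, $\overline{X_{i,j}}$, $\overline{X_{j,i}}$ and $\overline{\mathcal{G}}$), and (2)
the weight of the set, denoted by $\omega(S_i)$, $\omega(X_{i,j})$, $\omega(X_{j,i})$,
$\omega(\mathcal{K}_i)$, $\omega(\mathcal{A}_{i,j})$, $\omega(\mathcal{W}_{i,j})$, and $\omega(\mathcal{G})$. Each peer communicates
these two statistics for each set. We can write the following
expressions for the weights and the average vectors of each
set:

Knowledge

• $\omega(\mathcal{K}_i) = \omega(S_i) + \sum_{P_j \in \Gamma_i} \omega(X_{j,i})$

• $\overline{\mathcal{K}_i} = \frac{\omega(S_i)}{\omega(\mathcal{K}_i)} \overline{S_i} + \sum_{P_j \in \Gamma_i} \frac{\omega(X_{j,i})}{\omega(\mathcal{K}_i)} \overline{X_{j,i}}$

Agreement

• $\omega(\mathcal{A}_{i,j}) = \omega(X_{i,j}) + \omega(X_{j,i})$

• $\overline{\mathcal{A}_{i,j}} = \frac{\omega(X_{i,j})}{\omega(\mathcal{A}_{i,j})} \overline{X_{i,j}} + \frac{\omega(X_{j,i})}{\omega(\mathcal{A}_{i,j})} \overline{X_{j,i}}$

Withheld

• $\omega(\mathcal{W}_{i,j}) = \omega(\mathcal{K}_i) - \omega(\mathcal{A}_{i,j})$

• $\overline{\mathcal{W}_{i,j}} = \frac{\omega(\mathcal{K}_i)}{\omega(\mathcal{W}_{i,j})} \overline{\mathcal{K}_i} - \frac{\omega(\mathcal{A}_{i,j})}{\omega(\mathcal{W}_{i,j})} \overline{\mathcal{A}_{i,j}}$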


Note that these computations are local to a peer. The general
methodology for computing $\mathcal{F}(\overline{\mathcal{G}})$ requires us to cover
the domain of $\mathcal{F}$ using non-overlapping convex regions. For
the GMM, we show the convex regions that we need for
monitoring the three parameters.
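
The weight and average bookkeeping for the knowledge, agreement and withheld knowledge can be sketched in a few lines. This is an illustrative reading of the set statistics, assuming each set is represented by a `(weight, average)` pair with the average stored as a list of floats; the helper names `_mix`, `knowledge`, `agreement` and `withheld` are hypothetical.

```python
def _mix(parts):
    """Combine (weight, average) pairs into one pair.

    A negative weight removes a set from the combination. An empty result is
    reported as (0.0, None) because no average is defined for it.
    """
    weight = sum(w for w, _ in parts)
    vectors = [(w, v) for w, v in parts if v is not None]
    if abs(weight) < 1e-12 or not vectors:
        return 0.0, None
    dim = len(vectors[0][1])
    avg = [sum(w * v[c] for w, v in vectors) / weight for c in range(dim)]
    return weight, avg


def knowledge(local, received):
    """Statistics of K_i from the local set S_i and every received X_{j,i}."""
    return _mix([local] + list(received))


def agreement(sent, received):
    """Statistics of A_{i,j} from X_{i,j} (sent) and X_{j,i} (received)."""
    return _mix([sent, received])


def withheld(know, agree):
    """Statistics of W_{i,j}: the knowledge with the agreement removed."""
    return _mix([know, (-agree[0], agree[1])])
```

For example, with a local set of weight 4 and average [1, 1], a message of weight 2 and average [4, 1] received from a neighbor, and a message of weight 2 and average [1, 1] sent to it, the knowledge has weight 6, the agreement weight 4, and the withheld knowledge weight 2.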

The next section presents a general criterion which a peer
can use to decide the correctness of the solution based on
only its local vectors.



5. GLOBALLY CORRECT TERMINATION 
CRITERIA 

The goal of the monitoring algorithm is to raise a flag
whenever the estimates of the parameters are no longer valid
with respect to the union of all data, i.e., $\mathcal{G}$. The EM
monitoring algorithm guarantees eventual correctness, which means
that once computation terminates, each peer computes the
correct result as compared to a centralized setting. The
following theorem allows a peer to stop sending messages and
achieve a correct termination state, i.e., to decide whether
$\mathcal{F}(\overline{\mathcal{G}}) > \epsilon$ or $\mathcal{F}(\overline{\mathcal{G}}) < \epsilon$, solely based on
$\overline{\mathcal{K}_i}$, $\overline{\mathcal{A}_{i,j}}$, and $\overline{\mathcal{W}_{i,j}}$.



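Theorem 5.1. [Termination Criteria] [36] Let $P_1, \ldots, P_n$
be a set of peers connected to each other over a spanning tree
$G(V, E)$. Let $\mathcal{G}$, $\mathcal{K}_i$, $\mathcal{A}_{i,j}$, and $\mathcal{W}_{i,j}$ be as defined in the
previous section. Let $R$ be any region in $\mathcal{R}_{\mathcal{F}}$. If at time $t$ no
messages traverse the network, and for each $P_i$, $\overline{\mathcal{K}_i} \in R$
and for every $P_j \in \Gamma_i$, $\overline{\mathcal{A}_{i,j}} \in R$ and either $\overline{\mathcal{W}_{i,j}} \in R$ or
$\mathcal{W}_{i,j} = \emptyset$, then $\overline{\mathcal{G}} \in R$.

Proof. We omit the proof here. Interested readers are
referred to [36]. □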

The above theorem allows a peer to stop communicating
and output $\mathcal{F}(\overline{\mathcal{K}_i})$, which will eventually become equal
to $\mathcal{F}(\overline{\mathcal{G}})$. A peer can avoid communication even if its local
data changes or the network changes as long as the condition of
the theorem is satisfied. Indeed, if the condition of the theorem
holds for every peer, and all messages have been delivered,
then Theorem 5.1 guarantees this is the correct solution.
Otherwise, if there exists one peer $P_z$ for which the condition
does not hold, then one of two things will happen:
(1) a message will eventually be received by $P_z$, or (2) $P_z$
will send a message. In either of these two cases, the knowledge
$\mathcal{K}_z$ will change, thereby guaranteeing globally correct
convergence.
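
The local test that Theorem 5.1 suggests can be sketched as follows: a peer may stay silent when its knowledge, every agreement, and every non-empty withheld knowledge fall in the same convex region. This is a simplified reading for illustration; the complete message-sending rule is given in [36], and the names `can_stay_silent` and `threshold_region`, as well as the convention that `None` denotes the tie region or an empty set, are assumptions of this sketch.

```python
def threshold_region(eps):
    """Region function for a scalar threshold: two half-lines and a tie point."""
    def region_of(z):
        if z[0] < eps:
            return "below"
        if z[0] > eps:
            return "above"
        return None  # exactly on the threshold: tie region
    return region_of


def can_stay_silent(region_of, know_avg, neighbor_stats):
    """Check the local condition of the termination theorem for one peer.

    `neighbor_stats` holds one (agreement_avg, withheld_avg) pair per neighbor;
    withheld_avg is None when the withheld knowledge is empty.
    """
    region = region_of(know_avg)
    if region is None:
        return False  # knowledge in the tie region: the condition cannot hold
    for agree_avg, withheld_avg in neighbor_stats:
        if region_of(agree_avg) != region:
            return False
        if withheld_avg is not None and region_of(withheld_avg) != region:
            return False
    return True
```

When the function returns `False` for some neighbor, the peer has to send that neighbor a message, which changes the agreement and the withheld knowledge on both sides.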

Using this Theorem, we now proceed to monitor each of 
the parameters of the GMM using the distributed EM. 


6. MONITORING GMM PARAMETERS 

In this section we present the monitoring of the log like- 
lihood of the data and the three parameters given in Equa- 
tions 7-9. 


6.1 Monitoring log likelihood 


6.2 Monitoring $\pi_s$

Monitoring $\pi_s$ implies thresholding the absolute difference
between the current $\pi_s$ (implied by the current data) and the
calculated one $\hat{\pi}_s$ with respect to a user defined constant
$\epsilon_1$. Denoting this difference as $Err(\pi_s)$, we can write

$$Err(\pi_s) = |\pi_s - \hat{\pi}_s| < \epsilon_1$$

$$\Leftrightarrow \left| \frac{\sum_{i=1}^{p} \sum_{a=1}^{m_i} q_{i,s,a}}{\sum_{i=1}^{p} m_i} - \hat{\pi}_s \right| < \epsilon_1$$

$$\Leftrightarrow \left| \frac{\sum_{i=1}^{p} \sum_{a=1}^{m_i} [q_{i,s,a} - \hat{\pi}_s]}{\sum_{i=1}^{p} m_i} \right| < \epsilon_1$$

This can be monitored using the framework presented in
Section 4.4. Note that the quantity
$\frac{\sum_{i=1}^{p} \sum_{a=1}^{m_i} [q_{i,s,a} - \hat{\pi}_s]}{\sum_{i=1}^{p} m_i}$
is the average of the estimates in the E-step ($q_{i,s,a} - \hat{\pi}_s$)
across all the peers. Each peer subtracts $\hat{\pi}_s$ from each of its
local $q_{i,s,a}$. This forms the input $S_i$ for this monitoring problem.

Figure 1: (A) the area inside an $\epsilon$ circle (B) A random
vector (C) A tangent defining a half-space (D)
The areas between the circle and the union of half-spaces
are the tie areas.

However, due to the presence of the modulus operator,
two concurrent monitoring problems need to be run instead
of just one. Let these instances be $M_1^{\pi_s}$ and $M_2^{\pi_s}$, where
$M_1^{\pi_s}$ and $M_2^{\pi_s}$ are used for checking if $Err(\pi_s) < \epsilon_1$ and
$-Err(\pi_s) < \epsilon_1$ respectively. Since this monitoring problem
is in $\mathbb{R}$, the convex regions are subsets of the real line.
Therefore, for monitoring $\pi_s$, the following initializations need to
be carried out:

• $M_1^{\pi_s}.S_i = \{q_{i,s,1} - \hat{\pi}_s, \ldots, q_{i,s,m_i} - \hat{\pi}_s\}$

• $M_2^{\pi_s}.S_i = \{\hat{\pi}_s - q_{i,s,1}, \ldots, \hat{\pi}_s - q_{i,s,m_i}\}$

• $\mathcal{R}_{\mathcal{F}} = \{z \in \mathbb{R} : -1 < z < \epsilon_1\} \cup \{z \in \mathbb{R} : \epsilon_1 < z < 1\}$
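
The initialization for monitoring the mixing probability can be sketched as below. Each peer builds the inputs of the two monitoring instances from its local E-step estimates, and the region function splits the real line at the threshold. The names `pi_inputs`, `line_region` and `global_average` are hypothetical, and `global_average` exists only to check the arithmetic in one place; in the distributed algorithm no peer ever computes it directly.

```python
def pi_inputs(q_local, pi_hat):
    """Inputs of the two monitoring instances for one peer and one component.

    The first list holds q - pi_hat, the second pi_hat - q, so that the two
    signed differences can be thresholded separately.
    """
    first = [q - pi_hat for q in q_local]
    second = [pi_hat - q for q in q_local]
    return first, second


def line_region(z, eps):
    """Convex regions on the real line: below eps, above eps, or the tie point."""
    if z < eps:
        return "below"
    if z > eps:
        return "above"
    return None


def global_average(per_peer_inputs):
    """Average over all tuples of all peers (for checking only)."""
    total = sum(sum(values) for values in per_peer_inputs)
    count = sum(len(values) for values in per_peer_inputs)
    return total / count
```

If both instances report the region below the threshold, the absolute difference between the current and the pre-computed mixing probability is within the threshold.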


6.3 Monitoring p s 

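Following a similar argument, monitoring $\vec{\mu}_s$ is equivalent
to thresholding the following quantity:

$$
\begin{aligned}
Err(\vec{\mu}_s) = \left\|\vec{\mu}_s - \widehat{\vec{\mu}}_s\right\| < \epsilon_2
&\Leftrightarrow \left\|\frac{\sum_{i=1}^{p}\sum_{a=1}^{m_i} q_{i,s,a}\,\vec{x}_{i,a}}{\sum_{i=1}^{p}\sum_{a=1}^{m_i} q_{i,s,a}} - \widehat{\vec{\mu}}_s\right\| < \epsilon_2 \\
&\Leftrightarrow \left\|\frac{\sum_{i=1}^{p}\sum_{a=1}^{m_i} q_{i,s,a}\left[\vec{x}_{i,a} - \widehat{\vec{\mu}}_s\right]}{\sum_{i=1}^{p}\sum_{a=1}^{m_i} q_{i,s,a}}\right\| < \epsilon_2
\end{aligned}
$$

The quantity
$\frac{\sum_{i=1}^{p}\sum_{a=1}^{m_i} q_{i,s,a}\left[\vec{x}_{i,a} - \widehat{\vec{\mu}}_s\right]}{\sum_{i=1}^{p}\sum_{a=1}^{m_i} q_{i,s,a}}$
is the average of $q_{i,s,a}\left[\vec{x}_{i,a} - \widehat{\vec{\mu}}_s\right]$
across all the peers. However, the average
in this case is not taken with respect to the number of tuples
in the dataset $S_i$, but rather over all the $q_{i,s,a}$'s. As a
result, we set $|S_i| = \sum_{a=1}^{m_i} q_{i,s,a}$. Moreover, for this problem,
the geometric interpretation of the monitoring problem
is to check whether the vector difference
$\vec{\mu}_s - \widehat{\vec{\mu}}_s$ lies inside a circle of radius $\epsilon_2$,
i.e. whether its L2-norm is below $\epsilon_2$. L2-norm thresholding
of the average data vector was first proposed in our earlier
paper [36]. In $\mathbb{R}^2$, the problem can be depicted using Figure 1.
The area in which $Err(\vec{\mu}_s) < \epsilon_2$ is inside the circle
(sphere) and hence is already convex in $\mathbb{R}^2$ ($\mathbb{R}^d$). However,
the other region outside the circle (sphere) is not convex.
Hence random tangent lines (planes) are drawn on the surface
of the circle (sphere) by choosing points $\vec{u}_1, \dots, \vec{u}_\ell$ on
the circle (sphere) (the same points across all peers). Each
of these half-spaces is convex. To check if an arbitrary point
$\vec{z}$ is inside the sphere, a peer simply checks if $\|\vec{z}\| < \epsilon_2$.
To check if it is outside, a peer selects the first point $\vec{u}_i$ such
that $\vec{z} \cdot \vec{u}_i > \epsilon_2$. The following denotes the initialization
necessary for this instance of the problem $M^{\mu_s}$: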


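• $M^{\mu_s}.S_i = \left\{ q_{i,s,1}\left[\vec{x}_{i,1} - \widehat{\vec{\mu}}_s\right], \dots, q_{i,s,m_i}\left[\vec{x}_{i,m_i} - \widehat{\vec{\mu}}_s\right] \right\}$

• local weighted average of $M^{\mu_s}.S_i$:
$\frac{\sum_{a=1}^{m_i} q_{i,s,a}\left[\vec{x}_{i,a} - \widehat{\vec{\mu}}_s\right]}{\sum_{a=1}^{m_i} q_{i,s,a}}$

• $\mathcal{R}_\mathcal{F} = \underbrace{\{\vec{z} \in \mathbb{R}^d : \|\vec{z}\| < \epsilon_2\}}_{R_{in}} \;\bigcup_{i=1}^{\ell}\; \underbrace{\{\vec{z} \in \mathbb{R}^d : \vec{z} \cdot \vec{u}_i > \epsilon_2\}}_{R_i}$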
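As a rough illustration of how a peer could look up the region of the cover that contains a vector, the following Python sketch handles the two-dimensional case. It assumes unit-length tangent directions and, purely to keep the example deterministic, spaces them evenly instead of drawing them at random; `tangent_directions` and `active_region` are hypothetical names.

```python
import math


def tangent_directions(count):
    """Hypothetical helper: 'count' evenly spaced unit vectors in R^2.

    The text draws the tangent points at random (identically on all peers);
    even spacing is an assumption made here for reproducibility.
    """
    return [(math.cos(2 * math.pi * t / count), math.sin(2 * math.pi * t / count))
            for t in range(count)]


def active_region(z, eps2, directions):
    """Return "in", the index of the first half-space containing z, or "tie"."""
    if math.hypot(z[0], z[1]) < eps2:          # inside the circle of radius eps2
        return "in"
    for idx, u in enumerate(directions):       # first tangent half-space with z . u > eps2
        if z[0] * u[0] + z[1] * u[1] > eps2:
            return idx
    return "tie"                               # between the circle and all half-spaces
```

With four directions and $\epsilon_2 = 1$, the point (0.5, 0) falls inside the circle, (2, 0) falls in the first half-space, and (0.9, 0.9) lies in a tie area because it is outside the circle but not beyond any tangent line.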


6.4 Monitoring $C_s$

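The last parameter that we need to monitor is the covariance
matrix $C_s$. A natural extension of the $L_2$-norm in this
case is the Frobenius norm. Let $\vec{y}_{i,a} = \vec{x}_{i,a} - \vec{\mu}_s$. We have,

$$
C_s = \frac{\sum_{i=1}^{p}\sum_{a=1}^{m_i} q_{i,s,a}\,(\vec{x}_{i,a} - \vec{\mu}_s)(\vec{x}_{i,a} - \vec{\mu}_s)^T}{\sum_{i=1}^{p}\sum_{a=1}^{m_i} q_{i,s,a}}
= \frac{1}{\sum_{i=1}^{p}\sum_{a=1}^{m_i} q_{i,s,a}} \sum_{i=1}^{p}\sum_{a=1}^{m_i} q_{i,s,a}
\begin{pmatrix}
y_{i,a.1}^2 & y_{i,a.1}\,y_{i,a.2} & \cdots & y_{i,a.1}\,y_{i,a.d} \\
\vdots & \vdots & \ddots & \vdots \\
y_{i,a.d}\,y_{i,a.1} & y_{i,a.d}\,y_{i,a.2} & \cdots & y_{i,a.d}^2
\end{pmatrix}
$$

Therefore,

$$
\begin{aligned}
\|C_s\|_F^2 &= \left(\frac{\sum_{i=1}^{p}\sum_{a=1}^{m_i} q_{i,s,a}\,y_{i,a.1}^2}{\sum_{i=1}^{p}\sum_{a=1}^{m_i} q_{i,s,a}}\right)^2
+ \left(\frac{\sum_{i=1}^{p}\sum_{a=1}^{m_i} q_{i,s,a}\,y_{i,a.1}\,y_{i,a.2}}{\sum_{i=1}^{p}\sum_{a=1}^{m_i} q_{i,s,a}}\right)^2
+ \cdots
+ \left(\frac{\sum_{i=1}^{p}\sum_{a=1}^{m_i} q_{i,s,a}\,y_{i,a.d}^2}{\sum_{i=1}^{p}\sum_{a=1}^{m_i} q_{i,s,a}}\right)^2 \\
&\le \left(\frac{\sum_{i=1}^{p}\sum_{a=1}^{m_i} q_{i,s,a}\left\{y_{i,a.1}^2 + \cdots + y_{i,a.d}^2\right\}}{\sum_{i=1}^{p}\sum_{a=1}^{m_i} q_{i,s,a}}\right)^2
\end{aligned}
$$

By taking the square root and re-substituting $\vec{y}_{i,a}$, we get,

$$
\begin{aligned}
C_s^n &= \frac{\sum_{i=1}^{p}\sum_{a=1}^{m_i} q_{i,s,a} \sum_{k=1}^{d}\left(x_{i,a.k} - \mu_{s.k}\right)^2}{\sum_{i=1}^{p}\sum_{a=1}^{m_i} q_{i,s,a}} \\
&= \frac{\sum_{k=1}^{d}\left\{\sum_{i=1}^{p}\sum_{a=1}^{m_i} q_{i,s,a}\left(x_{i,a.k} - \mu_{s.k}\right)^2\right\}}{\sum_{i=1}^{p}\sum_{a=1}^{m_i} q_{i,s,a}} \\
&= \frac{\sum_{k=1}^{d}\left\{\sum_{i=1}^{p}\sum_{a=1}^{m_i} q_{i,s,a}\,x_{i,a.k}^2 - \mu_{s.k}^2 \sum_{i=1}^{p}\sum_{a=1}^{m_i} q_{i,s,a}\right\}}{\sum_{i=1}^{p}\sum_{a=1}^{m_i} q_{i,s,a}} \\
&= \sum_{k=1}^{d}\left\{\frac{\sum_{i=1}^{p}\sum_{a=1}^{m_i} q_{i,s,a}\,x_{i,a.k}^2}{\sum_{i=1}^{p}\sum_{a=1}^{m_i} q_{i,s,a}} - \mu_{s.k}^2\right\} \\
&= \sum_{k=1}^{d}\left[\frac{\sum_{i=1}^{p}\sum_{a=1}^{m_i} q_{i,s,a}\,x_{i,a.k}^2}{\sum_{i=1}^{p}\sum_{a=1}^{m_i} q_{i,s,a}}\right]
- \sum_{k=1}^{d}\left[\frac{\sum_{i=1}^{p}\sum_{a=1}^{m_i} q_{i,s,a}\,x_{i,a.k}}{\sum_{i=1}^{p}\sum_{a=1}^{m_i} q_{i,s,a}}\right]^2
\end{aligned}
$$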
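The last form of $C_s^n$ (weighted second moments minus squared weighted means) and its relation to the Frobenius norm can be checked numerically. The Python sketch below is a centralized sanity check under the assumption that all weighted points are available in one place; the function names are hypothetical.

```python
import math


def weighted_moments(points, weights):
    """Weighted mean and weighted second moment of every coordinate."""
    d = len(points[0])
    w = sum(weights)
    mean = [sum(q * x[k] for q, x in zip(weights, points)) / w for k in range(d)]
    second = [sum(q * x[k] ** 2 for q, x in zip(weights, points)) / w for k in range(d)]
    return mean, second


def c_n(points, weights):
    """C_s^n as sum_k E[x_k^2] - sum_k (E[x_k])^2 under the weights q."""
    mean, second = weighted_moments(points, weights)
    return sum(second) - sum(m * m for m in mean)


def frobenius_of_covariance(points, weights):
    """Frobenius norm of the weighted covariance matrix, computed entry by entry."""
    d = len(points[0])
    w = sum(weights)
    mean, _ = weighted_moments(points, weights)
    total = 0.0
    for k in range(d):
        for l in range(d):
            c_kl = sum(q * (x[k] - mean[k]) * (x[l] - mean[l])
                       for q, x in zip(weights, points)) / w
            total += c_kl ** 2
    return math.sqrt(total)
```

For the four corner points of a square of side 2 with equal weights, the covariance is the identity, so `c_n` returns 2 while the Frobenius norm is $\sqrt{2}$, consistent with $C_s^n$ being an upper bound.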


Thus the thresholding problem is to check if $\|C_s\|_F - C_s < \epsilon_3$, i.e.
$\|C_s\|_F < C_s + \epsilon_3$. As we show next, $\|C_s\|_F$ is not a convex
function, but $C_s^n$ is. Thus we monitor the latter one instead.
Note that $C_s^n < \epsilon_3 \Rightarrow \|C_s\|_F < \epsilon_3$. However, the other side
of the inequality is not true, i.e. $C_s^n > \epsilon_3 \nRightarrow \|C_s\|_F > \epsilon_3$.
Thus using $C_s^n$ for thresholding is more conservative: in
the worst case we will have more false alerts for
building new models, but will not miss any alert.

Let $Err(C_s^n) = C_s^n - C_s$. Let $g : \mathbb{R}^{2d} \rightarrow \mathbb{R}$ be defined
as follows: $\forall (s_1, \dots, s_{2d}) \in \mathbb{R}^{2d}$,
$g(s_1, \dots, s_{2d}) = \sum_{t=1}^{d} s_t - \sum_{t=d+1}^{2d} s_t^2 - C_s - \epsilon_3$.
We have the following key result:

$$
Err(C_s^n) < \epsilon_3 \Leftrightarrow
g\left(\frac{\sum_{i=1}^{p}\sum_{a=1}^{m_i} q_{i,s,a}\left(x_{i,a.1}^2, \dots, x_{i,a.d}^2, x_{i,a.1}, \dots, x_{i,a.d}\right)}{\sum_{i=1}^{p}\sum_{a=1}^{m_i} q_{i,s,a}}\right) < 0.
$$


Each peer can locally compute the $2d$-dimensional vector
$q_{i,s,a}\left(x_{i,a.1}^2, \dots, x_{i,a.d}^2, x_{i,a.1}, \dots, x_{i,a.d}\right)$. Then the goal boils
down to zero-thresholding $g$ applied to the average of local
vectors. The last thing to prove is that $g$ (or $-g$) is a convex
function. Taking the Hessian of $-g$ it can be easily shown
that $-g$ is convex. We can therefore apply our tangent line
technique for monitoring $Err(C_s^n)$.
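The reduction to zero-thresholding $g$ can be sketched in Python as follows. This is a centralized illustration in which the weighted average of the $2d$-dimensional vectors is computed directly rather than through message passing; `average_vector`, `g`, and the argument `c_model` (the reference value subtracted inside $g$) are hypothetical names.

```python
def average_vector(per_peer_points, per_peer_weights):
    """Weighted average over all peers of (x_1^2, ..., x_d^2, x_1, ..., x_d)."""
    d = len(per_peer_points[0][0])
    num = [0.0] * (2 * d)
    den = 0.0
    for points, weights in zip(per_peer_points, per_peer_weights):
        for x, q in zip(points, weights):
            for k in range(d):
                num[k] += q * x[k] ** 2      # first d coordinates: squares
                num[d + k] += q * x[k]       # last d coordinates: raw values
            den += q
    return [v / den for v in num]


def g(s, c_model, eps3):
    """g(s_1..s_2d) = sum of first d entries - sum of squares of last d - c_model - eps3."""
    d = len(s) // 2
    return sum(s[:d]) - sum(v * v for v in s[d:]) - c_model - eps3
```

For two peers holding `[(0, 0), (2, 0)]` and `[(0, 2), (2, 2)]` with unit weights, the averaged vector is `[2, 2, 1, 1]`, so $C_s^n = 2$; with `c_model = 1.5` and `eps3 = 1.0` the value of `g` is negative (no alert), while with `c_model = 0.5` it is positive.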

Note that the inside of $g$ is already convex. The outside
can be decomposed into convex regions using tangent lines
placed at random locations on $g$. For a 2-dimensional case,
Figure 2 shows the function (a parabola) and the possible
tangent lines. Let $\vec{u}_1, \dots, \vec{u}_\ell$ be points on the parabola which
define tangent lines. Checking if $g(\mathcal{K}_i) < 0$ is equivalent to
checking if $\mathcal{K}_i$ lies inside $g$. If not, we find the first point
$\vec{u}_i$ such that $\mathcal{K}_i \cdot \vec{u}_i > \|\vec{u}_i\|$. We then apply the theorem for
the half-space defined by $\vec{u}_i$.

Now since $Err(C_s^n)$ can be either positive or negative, we
need to check if $|Err(C_s^n)| < \epsilon_3$. Therefore, we need two
monitoring instances denoted by $M_1^{C_s}$ and $M_2^{C_s}$. The
following denotes the datasets and convex regions for this
monitoring problem.


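• $M_1^{C_s}.S_i = \left\{\left(x_{i,a.1}^2, \dots, x_{i,a.d}^2, x_{i,a.1}, \dots, x_{i,a.d}\right) : a = 1, \dots, m_i\right\}$

• local weighted average of $M_1^{C_s}.S_i$:
$\frac{\sum_{a=1}^{m_i} q_{i,s,a}\left(x_{i,a.1}^2, \dots, x_{i,a.d}^2, x_{i,a.1}, \dots, x_{i,a.d}\right)}{\sum_{a=1}^{m_i} q_{i,s,a}}$

• $M_1^{C_s}.\omega(S_i) = \sum_{a=1}^{m_i} q_{i,s,a}$

• $M_2^{C_s}.S_i = M_1^{C_s}.S_i$, $M_2^{C_s}.\omega(S_i) = \sum_{a=1}^{m_i} q_{i,s,a}$

• $\mathcal{R}_\mathcal{F} = \{\vec{z} \in \mathbb{R}^{2d} : g(\vec{z}) < 0\} \;\bigcup_{i=1}^{\ell}\; \{\vec{z} \in \mathbb{R}^{2d} : \vec{z} \cdot \vec{u}_i > \|\vec{u}_i\|\}$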

6.5 Algorithm 

Having discussed each of the monitoring problems, we are
now in a position to present the algorithms for monitoring
the parameters. For each Gaussian $s \in \{1, \dots, k\}$, we need
to run the following monitoring problems separately:

• one for $\pi_s$

• one for $\vec{\mu}_s$

• one for $C_s^n$

In order to use Theorem 5.1 for developing a monitoring
algorithm, the following steps must be followed:

1. Specify the input to the algorithm (i.e. $S_i$)

2. Specify the cover, i.e. $\mathcal{R}_\mathcal{F}$

For each of the monitoring problems, these are already specified
in the previous sections. Algorithms 1, 2, 3 and 4 present
the pseudo-code. We describe the algorithm with respect to
monitoring $\pi_s$ only. The other two are almost identical.


Figure 2: (A) the area inside a parabola (B) The
area covered by the half-space (C) A tangent defining
a half-space.


Input: $\epsilon_2$, $\mathcal{R}_\mathcal{F}$, $S_i$, $\Gamma_i$, $L$ and $\widehat{\vec{\mu}}_s$
Output: Set $flag^{\mu_s} = 1$ if $\|M^{\mu_s}.\mathcal{K}_i\| > \epsilon_2$, and $0$ otherwise
Initialization: Initialize the monitoring instance $M^{\mu_s}$
On MessageRecvd($X$, $\omega(X)$) from $P_j$:
    $M^{\mu_s}.X_{j,i} \leftarrow X$;
    $M^{\mu_s}.\omega(X_{j,i}) \leftarrow \omega(X)$;
    Update vectors;
On any Event:
    Call ProcessEvent($M^{\mu_s}$, $\mathcal{R}_\mathcal{F}$, $\Gamma_i$, $L$, LastMsgSent);

Algorithm 2: Pseudo code for monitoring $\vec{\mu}_s$ for any peer $P_i$.


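Input: $\epsilon_1$, $\mathcal{R}_\mathcal{F}$, $S_i$, $\Gamma_i$, $L$ and $\widehat{\pi}_s$
Output: Set $flag^{\pi_s} = 1$ if $M_1^{\pi_s}.\mathcal{K}_i > \epsilon_1 \vee M_2^{\pi_s}.\mathcal{K}_i > \epsilon_1$, and $0$ otherwise
Initialization:
    • Generate $q_{i,s,a}$ ($\forall \vec{x}_{i,a} \in S_i$) using Equation 6
    • Initialize two monitoring instances $M_1^{\pi_s}$ and $M_2^{\pi_s}$
On MessageRecvd($X$, $\omega(X)$, $id$) from $P_j$:
    $M_{id}^{\pi_s}.X_{j,i} \leftarrow X$;
    $M_{id}^{\pi_s}.\omega(X_{j,i}) \leftarrow \omega(X)$;
    Update vectors;
On any Event:
    Call ProcessEvent($M_1^{\pi_s}$, $\mathcal{R}_\mathcal{F}$, $\Gamma_i$, $L$, LastMsgSent);
    Call ProcessEvent($M_2^{\pi_s}$, $\mathcal{R}_\mathcal{F}$, $\Gamma_i$, $L$, LastMsgSent);

Algorithm 1: Pseudo code for monitoring $\pi_s$ for any peer $P_i$.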


For any peer $P_i$, the inputs to the algorithm are $\epsilon$, $\mathcal{R}_\mathcal{F}$,
$S_i$, $\Gamma_i$, $L$ and $\widehat{\pi}_s$ (we describe $L$ later). The output for each
of the monitoring instances is a flag which is set if the
corresponding $\mathcal{K}_i$ exceeds the threshold. In the initialization
phase, the peer initializes its local statistics $\mathcal{K}_i$, $\mathcal{A}_{i,j}$ and $\mathcal{W}_{i,j}$
according to the equations in Section 4.4. The algorithm is
entirely event driven. Events can be one of the following:
a change in local data $S_i$, a message received, or a change in
the set of neighbors $\Gamma_i$. If one of these things happens, a
peer calls the ProcessEvent method (Algorithm 4). The
goal of this method is to make sure that the conditions of
Theorem 5.1 are satisfied by the peer which runs it. First,
peer $P_i$ finds the active region: the region $R \in \mathcal{R}_\mathcal{F}$ in which
$\mathcal{K}_i$ lies, i.e. $R = \mathcal{R}_\mathcal{F}(\mathcal{K}_i)$. If $R = T$, i.e. the knowledge
lies in the tie region, the condition of the theorem does not
guarantee a solution and hence the only correct solution is
flooding all of its data. On the other hand, if for all $P_j \in \Gamma_i$,
both $\mathcal{A}_{i,j} \in R$ and $\mathcal{W}_{i,j} \in R$, $P_i$ does nothing and can rely
on the result of the theorem for correctness. If $\mathcal{A}_{i,j} \notin R$ or
$\mathcal{W}_{i,j} \notin R$, the result of the theorem dictates $P_i$ to send a
message to $P_j$. Other than these two cases, a peer need not
send any message even if its local data has changed.


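Input: $\epsilon_3$, $\mathcal{R}_\mathcal{F}$, $S_i$, $\Gamma_i$, $L$ and $C_s$
Output: Set $flag^{C_s} = 1$ if $g(M_1^{C_s}.\mathcal{K}_i) > 0 \vee g(M_2^{C_s}.\mathcal{K}_i) > 0$, and $0$ otherwise
Initialization: Initialize two monitoring instances $M_1^{C_s}$ and $M_2^{C_s}$
On MessageRecvd($X$, $\omega(X)$, $id$) from $P_j$:
    $M_{id}^{C_s}.X_{j,i} \leftarrow X$;
    $M_{id}^{C_s}.\omega(X_{j,i}) \leftarrow \omega(X)$;
    Update vectors;
On any Event:
    Call ProcessEvent($M_1^{C_s}$, $\mathcal{R}_\mathcal{F}$, $\Gamma_i$, $L$, LastMsgSent);
    Call ProcessEvent($M_2^{C_s}$, $\mathcal{R}_\mathcal{F}$, $\Gamma_i$, $L$, LastMsgSent);

Algorithm 3: Pseudo code for monitoring $C_s^n$ for any peer $P_i$.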


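Function ProcessEvent($M$, $\mathcal{R}_\mathcal{F}$, $\Gamma_i$, $L$, LastMsgSent)
begin
    forall $P_j \in \Gamma_i$ do
        if $\mathcal{R}_\mathcal{F}(M.\mathcal{K}_i) = T$ then
            $M.\omega(X_{i,j}) \leftarrow M.\omega(\mathcal{K}_i) - M.\omega(X_{j,i})$;
            $M.X_{i,j} \leftarrow \frac{M.\omega(\mathcal{K}_i)\,M.\mathcal{K}_i - M.\omega(X_{j,i})\,M.X_{j,i}}{M.\omega(X_{i,j})}$;
        end
        if ($M.\mathcal{A}_{i,j} \notin \mathcal{R}_\mathcal{F}(M.\mathcal{K}_i)$)
           $\vee$ ($M.\mathcal{W}_{i,j} \notin \mathcal{R}_\mathcal{F}(M.\mathcal{K}_i)$)
           $\vee$ ($M.\omega(\mathcal{W}_{i,j}) = 0 \wedge M.\mathcal{A}_{i,j} \neq M.\mathcal{K}_i$) then
            Compute new $M.\omega(X_{i,j})$ and $M.X_{i,j}$ such
            that $M.\mathcal{A}_{i,j}, M.\mathcal{W}_{i,j} \in \mathcal{R}_\mathcal{F}(M.\mathcal{K}_i)$
        end
        if CurrTime $-$ LastMsgSent $\geq L$ then
            SendMsg($M.X_{i,j}$, $M.\omega(X_{i,j})$, $id$) to $P_j$
        end
        else Wait($L -$ (CurrTime $-$ LastMsgSent))
            units and check again
        end
    end
end

Algorithm 4: Procedure for handling an event.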
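The update performed in the tie-region branch of ProcessEvent can be illustrated with a short Python sketch. It assumes, as a simplification for this example, that the knowledge is the weight-averaged combination of the local data and what was received from the single neighbor $P_j$; the helper names `knowledge` and `flood_message` are hypothetical.

```python
def knowledge(local_mean, local_weight, x_ji, w_ji):
    """Assumed form of the knowledge with one neighbor: weighted average of
    the local data and the last message received from P_j."""
    total = local_weight + w_ji
    vec = [(local_weight * a + w_ji * b) / total for a, b in zip(local_mean, x_ji)]
    return vec, total


def flood_message(k_vec, k_weight, x_ji, w_ji):
    """X_ij and omega(X_ij) in the tie region: the knowledge minus what
    P_j itself sent previously."""
    w_ij = k_weight - w_ji
    if w_ij <= 0:
        raise ValueError("nothing left to send once P_j's contribution is removed")
    x_ij = [(k_weight * k - w_ji * x) / w_ij for k, x in zip(k_vec, x_ji)]
    return x_ij, w_ij
```

With local data of weight 3 and mean (1, 2), and a received message of weight 1 and value (5, 6), the knowledge is (2, 3) with weight 4; removing the neighbor's contribution returns exactly the local statistics, which is what gets flooded.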

Message sending is performed in the ProcessEvent method
itself. When $R = T$, the peer has to flood whatever knowledge
it has. Thus it sets $X_{i,j}$ and $\omega(X_{i,j})$ equal to its
knowledge minus what it had received from $P_j$ previously.
It then sends this to $P_j$. However, when $R \neq T$, a peer
can refrain from sending all data. As shown by Wolff et al.
[36] and Bhaduri et al. [4], this technique of sending all the
data has adverse effects on the communication in a dynamic
data scenario. This is because if a peer communicates all of
its data, and the data changes again later, the change is far
more noisy than the original data. So we always set $X_{i,j}$ and
$|X_{i,j}|$ such that some data is retained while still maintaining
the conditions of the theorem. We do this by checking
with an exponentially decreasing set of values of $|\mathcal{W}_{i,j}|$ until
either all of $\mathcal{K}_i$, $\mathcal{A}_{i,j}$ and $\mathcal{W}_{i,j}$ are in $R$, or $|\mathcal{W}_{i,j}| = 0$. If the
latter happens, then there exists no condition under which a peer
can have withheld data, and it has to send everything. The
conditions stated in the ProcessEvent method are exhaustive;
a peer only sends a message if one of these conditions
is violated. This guarantees eventual correctness based on
the theorem. Similarly, whenever it receives a message ($X$
and $|X|$), it sets $X_{j,i} \leftarrow X$ and $|X_{j,i}| \leftarrow |X|$ and calls the
ProcessEvent method again.

To prevent message explosion, in our event-based system
we employ a “leaky bucket” mechanism which ensures that
no two messages are sent in a period shorter than a constant
$L$. This technique is not new but has been used earlier
[4] [36]. The basic idea is to maintain a timer. Whenever
a peer sends a message, the timer is started. If, later, the
peer wants to send another message, it checks if $L$ time units
have passed since the timer was started. If yes, it sends the
message and resets the timer to reflect the current time.
Otherwise, the peer waits for the time difference between $L$ and
the ‘timer’ time. When the timer expires, the peer checks
the conditions for sending messages and decides accordingly.
Note that this mechanism does not enforce synchronization
or affect correctness; at most it might delay convergence.
We explore its effect thoroughly in our experiments.
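A minimal Python sketch of such a leaky bucket, assuming integer simulator ticks as the time unit, is given below. The class and method names are hypothetical and are not taken from the implementation used in the experiments.

```python
class LeakyBucket:
    """Allows at most one message per 'period' time units."""

    def __init__(self, period):
        self.period = period      # the constant L
        self.last_sent = None     # time of the last message, None if none sent yet

    def remaining_wait(self, now):
        """Time the peer still has to wait before it may send again."""
        if self.last_sent is None:
            return 0
        return max(0, self.period - (now - self.last_sent))

    def try_send(self, now):
        """Return True (and restart the timer) if a message may be sent at 'now'."""
        if self.remaining_wait(now) == 0:
            self.last_sent = now
            return True
        return False
```

A peer that is refused by `try_send` would wait for `remaining_wait(now)` ticks and then re-check the sending conditions, since its data may have changed in the meantime.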

In the next section we describe how we can use this moni- 
toring algorithm to update the models, if they are outdated. 

7. COMPUTING NEW MODELS

Once the models ($\pi_s$, $\vec{\mu}_s$ and $C_s$) are monitored to within
$\epsilon$ of their true values, the next step is to rebuild the models if
they are found outdated. The monitoring algorithm presented
in the previous section generates an alert whenever one of
the following occurs for any $s \in \{1, \dots, k\}$:

• $|\pi_s - \widehat{\pi}_s| > \epsilon_1$

• $\|\vec{\mu}_s - \widehat{\vec{\mu}}_s\| > \epsilon_2$

• $|C_s^n - C_s| > \epsilon_3$

Building global models in a distributed environment is com- 
munication intensive. Here we rely on the outcome of our 
correct and efficient local algorithm to generate a trigger 
dictating the need for re-building the model. Given enough 
time to converge, the correctness of our monitoring algo- 
rithm ensures that even simple techniques such as best-effort 
sampling from the network may be sufficient to produce 
good results. If it does not, the underlying monitoring al- 
gorithm would eventually indicate that and a new model 
building will be triggered. 


The idea of model computation in the network is very
simple. Peers engage in a convergecast-broadcast process.
The monitoring algorithm can be viewed as a flag which
is raised whenever the model is misfit with respect to the
global data. If this happens for any peer, it does the following.
First it waits for a specific amount of time, which
we call the alert mitigation time $\tau$, to see if the alert is
indeed due to a data change or random noise. If the alert
exists even after $\tau$ units of time, the peer checks if it has
received data from all its neighbors except one. If yes, it
generates a sample of user-defined size $B$ from its own data
and each of its children, weighing each point inversely as the
size of its subtree such that each point has an equal chance
of being included in the sample. It then sends the sample
to its parent and marks its state as convergecast. Whenever
a peer receives data from all its neighbors, it becomes the root of
the convergecast and employs a centralized EM algorithm
to build new model parameters. It then sends these models
to itself and marks its state as broadcast. Whenever a
peer gets new models it forwards the models to all its neighbors
(except the one from which it received them) and moves from
the convergecast to the broadcast phase. Because we do not impose
the root of the tree, it may so happen that two peers get
all the data simultaneously. We break the tie in such scenarios
using the id of the nodes. Only the peer with the
highest id is allowed to propagate the model in the network.
Algorithm 5 presents the pseudo code of the overall
EM algorithm. As shown, there are three types of messages:
Monitoring_Msg, Pattern_Msg and Dataset_Msg. The
monitoring message is passed to the underlying monitoring
algorithm. The pattern message is received as part of the
broadcast round while the datasets are received when the
peer engages in a convergecast round.

8. ANALYSIS OF ALGORITHMS 

In this section we prove that (1) the PeDEM algorithm is 
correct, and (2) local. 

Lemma 8.1. PeDEM is eventually correct. 

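Proof. In the termination state, i.e. when all nodes in the
network have stopped sending messages and there are no
messages in transit, the knowledge $\mathcal{K}_i$ of each peer $P_i$ will
converge to one of these states: (1) $\mathcal{K}_i = \mathcal{G}$, or (2) $\mathcal{K}_i$, $\mathcal{A}_{i,j}$,
and $\mathcal{W}_{i,j}$ are in the same $R \in \mathcal{R}_\mathcal{F}$ for every neighbor $P_j$.
In the first case, $\mathcal{K}_i = \mathcal{G} \Rightarrow \mathcal{F}(\mathcal{K}_i) = \mathcal{F}(\mathcal{G})$. In the second
case, by Theorem 5.1, $\mathcal{K}_i, \mathcal{A}_{i,j}, \mathcal{W}_{i,j} \in R \Rightarrow \mathcal{G} \in R$. By
Definition 4.4, $\mathcal{F}$ is invariant in $R$ and hence $\mathcal{F}(\mathcal{G}) = \mathcal{F}(\mathcal{K}_i)$.
Thus the PeDEM algorithm is eventually correct. □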

Lemma 8.2. PeDEM is local. 

9. EXPERIMENTAL RESULTS 

In this section we demonstrate the performance of the PeDEM
algorithm on a synthetic dataset. Our implementation
of the algorithm was done in Java using the DDMT$^2$ toolkit
developed at UMBC. For the topology, we used the BRITE
topology generator$^3$. We experimented with the Barabasi
Albert (BA) model since it generates realistic edge delays (in
milliseconds), thereby simulating the internet. We convert the
edge delays to simulator ticks for time measurement since
wall time is meaningless when simulating thousands of
computers on a single PC. On top of each network generated by
BRITE, we overlay a communication tree.

$^2$ http://www.umbc.edu/ddm/Sftware/DDMT/

$^3$ http://www.cs.bu.edu/brite/


Input: $\epsilon_1$, $\epsilon_2$, $\epsilon_3$, $B$, $S_i$, $\Gamma_i$, $L$, $\widehat{\pi}_s$, $\widehat{\vec{\mu}}_s$, $C_s$ and $\tau$
Output: New model such that error is less than threshold
Initialization: Initialize vectors; Set
    LastDataAlert $\leftarrow \infty$; Datasent $\leftarrow$ false;

On Receiving a message:
    MsgType, Recvd_$P_j$ $\leftarrow$ MessageRecvdFrom($P_j$)
    if MsgType = Monitoring_Msg then
        Pass Message to Monitoring Algorithm;
    end
    if MsgType = Pattern_Msg then
        Update models;
        Forward new models to all neighbors;
        Datasent = false;
        Restart Monitoring Algorithm with new models;
    end
    if MsgType = Dataset_Msg then
        NumRecvd = Count_num_recvd();
        Recvd_Dataset = Recvd_Dataset $\cup$ Recvd_$P_j$;
        if NumRecvd = $|\Gamma_i| - 1$ then
            flag = Output_Monitoring_Algorithm();
            if Datasent = false $\wedge$ flag = 1 then
                if CurrTime $-$ LastDataAlert $> \tau$ then
                    D = Sample($S_i$, Recvd_Dataset, $B$);
                    Datasent = true;
                    Send D to remaining neighbor;
                end
                else LastDataAlert = CurrTime;
                    Check again in $\tau$ time;
            end
            if flag = 0 then LastDataAlert $\leftarrow \infty$
        end
        if NumRecvd = $|\Gamma_i|$ then
            D = Sample($S_i$, Recvd_Dataset, $B$);
            NewModel = EM(D);
            Forward NewModel to all neighbors;
            Datasent = false;
            Restart Monitoring Algorithm with NewModel;
        end
    end

if $S_i$, $\Gamma_i$ or $\mathcal{K}_i$ changes then
    Run Monitoring Algorithm;
    flag = Output_Monitoring_Algorithm();
    if flag = 1 and $P_i$ = IsLeaf() then
        Execute the same conditions as
        MsgType = Dataset_Msg;
    end
end

Algorithm 5: P2P EM Algorithm.

9.1 Data Generation 

The input data of a peer is a set of vectors in $\mathbb{R}^d$ generated
according to a multi-dimensional GMM. More specifically, for
a given experiment, we fix the number of Gaussians $k$, their
means $\vec{\mu}_1, \dots, \vec{\mu}_k$ and covariance matrices $C_1, \dots, C_k$, and
also the mixing probabilities $\pi_1, \dots, \pi_k$. Every time a
simulated peer needs an additional data point, it first selects a
Gaussian $s$ with corresponding probability $\pi_s$ and then
generates a Gaussian vector in $\mathbb{R}^d$ with mean and covariance
$\vec{\mu}_s, C_s$. The means and the covariances are changed
randomly at controlled intervals to create an epoch change.
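The data generation step described above can be sketched in Python with only the standard library. This is an illustration rather than the generator used in the experiments; the Cholesky-based sampling and the names `cholesky` and `draw_point` are assumptions of this sketch.

```python
import math
import random


def cholesky(cov):
    """Lower-triangular factor 'low' with low * low^T == cov (cov positive definite)."""
    d = len(cov)
    low = [[0.0] * d for _ in range(d)]
    for r in range(d):
        for c in range(r + 1):
            acc = cov[r][c] - sum(low[r][t] * low[c][t] for t in range(c))
            low[r][c] = math.sqrt(acc) if r == c else acc / low[c][c]
    return low


def draw_point(rng, mix, means, covs):
    """Select Gaussian s with probability mix[s], then draw one vector
    from the Gaussian with mean means[s] and covariance covs[s]."""
    s = rng.choices(range(len(mix)), weights=mix)[0]
    low = cholesky(covs[s])
    d = len(means[s])
    z = [rng.gauss(0.0, 1.0) for _ in range(d)]          # standard normal draws
    x = [means[s][r] + sum(low[r][c] * z[c] for c in range(r + 1)) for r in range(d)]
    return s, x
```

An epoch change would then correspond to replacing `means` and `covs` with new values while the peers keep calling `draw_point`.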

9.2 Measurement Metric 

In our experiments, the two most important parameters 
for measurement are the quality of the result and the cost 
of the algorithm. 

For the regression monitoring algorithm, quality is measured
in terms of the percentage of peers which correctly
compute an alert, i.e., the number of peers which report
that $\mathcal{K}_i < \epsilon$ when $\mathcal{G} < \epsilon$ and similarly $\mathcal{K}_i > \epsilon$ when $\mathcal{G} > \epsilon$.
We also report the overall quality, which is the average of the
qualities for both less than and greater than $\epsilon$ and hence lies
in between those two. Moreover, for each quality graph in
Figures ??, ??, ??, ??, ?? and ?? we report two quantities:
(1) the average quality over all peers, all epochs and 10
independent trials (the center markers) and (2) the standard
deviation over 10 independent trials (error bars). For the
regression computation algorithm, quality is defined as the L2
norm distance between the solution of our algorithm and the
actual regression weights. We compare this to a centralized
algorithm having access to all of the data.

We refer to the cost of the algorithm as the number of 
normalized messages sent, which is the number of messages 
sent by each peer per unit of leaky bucket L. Hence, 0.1 
normalized messages means that nine out of ten times the 
algorithm manages to avoid sending a message. We report 
both overall cost and the monitoring cost (stationary cost), 
which refers to the “wasted effort” of the algorithm. We also 
report, where appropriate, messages required for converge- 
cast and broadcast of the model. 
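As a worked illustration of the cost metric, the following Python sketch computes normalized messages under the reading that the count is taken per peer and per leaky bucket period $L$. The function name and the example figures are hypothetical and are not results from the experiments.

```python
def normalized_messages(total_messages, num_peers, elapsed_ticks, bucket_period):
    """Messages sent per peer per leaky-bucket period L."""
    periods = elapsed_ticks / bucket_period      # number of sending opportunities per peer
    return total_messages / (num_peers * periods)
```

For instance, 20,000 messages sent by 1,000 peers over 100,000 ticks with $L = 500$ give 200 sending opportunities per peer and a cost of 0.1, i.e. a peer stays silent in nine out of ten opportunities.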

9.3 Typical Experiments 

A typical experiment is shown in Figure ??. In all the
experiments, about 4% of the data of each peer is changed
every 1000 simulator ticks. Moreover, after every $5 \times 10^5$
simulator ticks, the data distribution is changed. Therefore
there are two levels of data change: (1) every 1000
simulator ticks we sample 4% of new data from the same
distribution (stationary change) and (2) every $5 \times 10^5$ clock
ticks we change the distribution (non-stationary change).
To start with, every peer is supplied the same regression
coefficients as the coefficients of the data generator. Figure
?? shows that for the first epoch, the quality is very high
(nearly 96%). After $5 \times 10^5$ simulator ticks, we change the
weights of the generator without changing the coefficients
given to each peer. Therefore the percentage of peers
reporting $\mathcal{K}_i < \epsilon$ drops to 0. For the cost, Figure ?? shows
that the monitoring cost is low throughout the experiment
if we ignore the transitional effects.

9.4 Results: Regression Monitoring 

9.5 Results: Regression Models 

10. CONCLUSION 
Figure 3: Plot of typical experiments. Each experiment is repeated for several epochs. Quality is measured
both inside and outside $\epsilon$. Cost is measured during the entire experiment and during stationary phases. Last
80% of the time refers to stationary phase to ignore transitional effects.

Figure 4: Variation of the quality and cost of the monitoring algorithm on the different parameters. We
have separated the quality of the log-likelihood, mean and covariance monitoring algorithm. We have also
separated the cost of the entire experiment and during stationary phase.

Figure 5: Scalability of the monitoring algorithm with respect to number of peers, dimension of each gaussian,
and number of gaussians.